A Hybridized Deep Learning Method for Bengali Image Captioning
نویسندگان
چکیده
An omnipresent challenging research topic in com-puter vision is the generation of captions from an input image. Previously, numerous experiments have been conducted on image captioning English but caption Bengali still sparse and need more refining. Only a few papers till now worked Bengali. Hence, we proffer standard strategy for two different sizes Flickr8k dataset BanglaLekha which only publicly available captioning. Afterward, our model were compared with generated by other researchers using architectures. Additionally, employed hybrid approach based InceptionResnetV2 or Xception as Convolution Neural Network Bidirectional Long Short-Term Memory Gated Recurrent Unit datasets. Furthermore, combination word embedding was also adapted. Lastly, performance evaluated Bilingual Evaluation Understudy proved that proposed indeed performed better consisting 4000 images dataset.
منابع مشابه
Deep Learning for Automatic Image Captioning in Poor Training Conditions
English. Recent advancements in Deep Learning show that the combination of Convolutional Neural Networks and Recurrent Neural Networks enables the definition of very effective methods for the automatic captioning of images. Unfortunately, this straightforward result requires the existence of large-scale corpora and they are not available for many languages. This paper describes a simple methodo...
متن کاملContrastive Learning for Image Captioning
Image captioning, a popular topic in computer vision, has achieved substantial progress in recent years. However, the distinctiveness of natural descriptions is often overlooked in previous work. It is closely related to the quality of captions, as distinctive captions are more likely to describe images with their unique aspects. In this work, we propose a new learning method, Contrastive Learn...
متن کاملDeep Learning for Video Classification and Captioning
Accelerated by the tremendous increase in Internet bandwidth and storage space, video data has been generated, published and spread explosively, becoming an indispensable part of today's big data. In this paper, we focus on reviewing two lines of research aiming to stimulate the comprehension of videos with deep learning: video classification and video captioning. While video classification con...
متن کاملStack-Captioning: Coarse-to-Fine Learning for Image Captioning
The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multistage prediction framework for image captioning, composed of multiple decoders each of which...
متن کاملLearning to Guide Decoding for Image Captioning
Recently, much advance has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task. In this paper, we propose an extension of the encoder-decoder framework by adding a component called guiding network. The guiding network models the attribute properties of input images, and its output is leveraged to compose the input of the decoder at ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: International Journal of Advanced Computer Science and Applications
سال: 2021
ISSN: ['2158-107X', '2156-5570']
DOI: https://doi.org/10.14569/ijacsa.2021.0120287